-
- NAME
- strsed - ed(1)/tr(1)-like substitute and replace function
-
- SYNOPSIS
- char *strsed(string, command, 0)
- char *string;
- char *command;
-
- or
-
- char *strsed(string, command, range)
- char *string;
- char *command;
- int range[2];
-
- DESCRIPTION
- Strsed is a regular expression pattern match and replace
- function that also combines tr(1)-like transliteration. The GNU
- regex package is used for the regular expression matching.
-
- Strsed can be used to provide the functionality of most of the
- other more "complicated" string functions (e.g. strchr,
- strrchr, strpbrk, strspn, strcspn, and strtok), although less
- efficiently in each case, due to its generality. Strsed is a
- very powerful and general function that can be used to carry out
- complicated string manipulations such as those that are possible
- in text editors.
-
- USAGE
- String should be a null-terminated character string. A copy is
- made and will be operated on according to the search and replace
- instructions contained in command. Unless an error occurs (see
- ERRORS), the passed character strings string and command are
- never corrupted, and the string that is returned may always be
- passed to free(3) since its space is obtained from malloc(3).
-
- Both string and command may contain the following C-like escape
- sequences:
-
- \b Backspace.
- \f Formfeed.
- \n Newline.
- \r Carriage Return.
- \s Space.
- \t Horizontal Tab.
- \v Vertical Tab.
- \z Used to remove ambiguity if necessary.
- \0-9 A reference to a register.
- (except for \0 in a regular expression.)
- \0x3d The character whose value is 3d hexadecimal.
- \0X3d The character whose value is 3d hexadecimal.
- \040 The character whose value is 40 octal.
- \32 The character whose value is 32 decimal.
-
- The NUL (0) character cannot be specified. A ``\'' followed by
- one to three digits can be interpreted in several ways. If one
- or two hex digits are preceded by an ``x'' or an ``X'', they
- will be taken as specifying a character in hexadecimal. If
- there are exactly three octal digits and the first is in the
- range ``0'' to ``3'' then they are taken as specifying a
- character in octal. Otherwise a single digit is taken to be a
- register reference and two or three digits are interpreted as
- specifying a character in decimal. \z can be used to avoid
- problems with ambiguity. For instance, \007 will be interpreted
- by strsed as octal 007. To specify the contents of register
- zero (\0) followed by the two characters ``07'', use \0\z07.
- The \z makes it clear what is meant (acting like a punctuation
- mark) and is otherwise ignored.
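-
- The digit rules above can be sketched as a small decoder. This is
- not strsed's own code: decode_escape, struct escape and the ESC_*
- kinds are hypothetical names, p is assumed to point just past the
- backslash, and two behaviours are assumptions (an unlisted
- character after a backslash stands for itself, and decimal values
- above 255 are rejected).
-
```c
#include <ctype.h>

/* Hypothetical classification of one decoded escape sequence. */
enum esc_kind { ESC_CHAR, ESC_REGISTER, ESC_NOTHING, ESC_ERROR };

struct escape {
    enum esc_kind kind; /* what the escape denotes */
    int value;          /* character code or register number */
    int consumed;       /* characters used after the backslash */
};

static int hex_value(int c)
{
    return isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
}

/* Sketch: decode the escape whose first character (after the '\') is at p. */
struct escape decode_escape(const char *p)
{
    struct escape e = { ESC_CHAR, 0, 1 };
    int n = 0, i;

    switch (*p) {
    case '\0': e.kind = ESC_ERROR; e.consumed = 0; return e;
    case 'b': e.value = '\b'; return e;
    case 'f': e.value = '\f'; return e;
    case 'n': e.value = '\n'; return e;
    case 'r': e.value = '\r'; return e;
    case 's': e.value = ' ';  return e;
    case 't': e.value = '\t'; return e;
    case 'v': e.value = '\v'; return e;
    case 'z': e.kind = ESC_NOTHING; return e; /* punctuation only */
    }

    if (!isdigit((unsigned char)*p)) {
        e.value = (unsigned char)*p; /* assumption: stands for itself */
        return e;
    }

    /* \0x3d or \0X3d: one or two hex digits after the x. */
    if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X') &&
        isxdigit((unsigned char)p[2])) {
        e.value = hex_value((unsigned char)p[2]);
        e.consumed = 3;
        if (isxdigit((unsigned char)p[3])) {
            e.value = e.value * 16 + hex_value((unsigned char)p[3]);
            e.consumed = 4;
        }
    } else {
        while (n < 3 && isdigit((unsigned char)p[n]))
            n++;
        e.consumed = n;
        if (n == 1) {                   /* \0 .. \9: a register */
            e.kind = ESC_REGISTER;
            e.value = p[0] - '0';
            return e;
        }
        if (n == 3 && p[0] <= '3' && p[1] <= '7' && p[2] <= '7') {
            e.value = (p[0] - '0') * 64 + (p[1] - '0') * 8 + (p[2] - '0');
        } else {                        /* two or three decimal digits */
            for (i = 0; i < n; i++)
                e.value = e.value * 10 + (p[i] - '0');
        }
    }
    if (e.value == 0 || e.value > 255)  /* NUL cannot be specified */
        e.kind = ESC_ERROR;
    return e;
}
```
-
- Decoding the text after the first backslash of \0\z07 yields
- register 0 with one character consumed; the following \z is then
- skipped and 07 is copied literally.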
-
- Strsed allows ed(1) like regular expressions and substitutions
- on string. The search and replace command is specified by
- command. The format of command is either
-
- /search_pattern/replacement/
- or
- g/search_pattern/replacement/
-
- In the first form, the search and replace is performed once on
- the string, and in the second, the replacement is done globally
- (i.e. for every occurrence of the search pattern in string). A
- leading ``s'' in the above is silently ignored. This allows for
- a syntax more like that of ed(1); for example, s/e/x/ is the
- same as /e/x/.
-
- If replacement is empty, then the matched text will be replaced
- by nothing - i.e. deleted.
-
- Search_pattern is a full regular expression (see ed(1)),
- including register specifications (i.e. \( ... \)) and register
- references, (e.g. \2) but not the {m,n} repetition feature of
- ed(1).
-
- Replacement consists of ordinary characters and/or register
- references (e.g. \1 or \2). \0 means the entire matched text.
- In addition, a register reference may be immediately followed by
- a transliteration request, of the form
-
- {char-list-1}{char-list-2}.
-
- The characters from char-list-1 will be transliterated into the
- corresponding ones from char-list-2 in the same manner as
- tr(1). If the register reference before a transliteration
- request is omitted, it defaults to \0. Within a transliteration
- request, the characters "}" and "-" are metacharacters and must
- be escaped with a leading \ if you want them to be interpreted
- literally. Character ranges such as a-z are expanded in the
- same fashion as tr(1). If char-list-2 is shorter than
- char-list-1 then char-list-2 is padded to be the same length as
- char-list-1 by repeating its last character as many times as are
- needed. For example, the transliteration request
-
- {a-z}{X}
-
- will transliterate all lower case letters into an 'X'.
- Character ranges may be increasing or decreasing.
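-
- A transliteration request can be sketched in two steps: expand
- each char-list as tr(1) does, then build a 256-entry map in which
- a short char-list-2 is padded by repeating its last character.
- The names expand_list and transliterate are hypothetical; only the
- \-, \} and \\ escapes are handled (other escapes are assumed to be
- decoded earlier), and treating an empty char-list-2 as an error is
- an assumption. Chained ranges run from their first character to
- their last, as described in the next paragraph.
-
```c
#include <stddef.h>

/* Read one list character, honouring a backslash escape such as \- or \}. */
static int list_char(const char **p)
{
    if (**p == '\\' && (*p)[1] != '\0')
        (*p)++;
    return (unsigned char)*(*p)++;
}

/* Sketch: expand a tr(1)-style list such as "a-z" into out (NUL-terminated).
   Returns the number of characters, or -1 if out is too small. */
int expand_list(const char *p, char *out, size_t outsize)
{
    size_t n = 0;

    while (*p) {
        int first = list_char(&p), last = first, c, step;

        /* An unescaped '-' followed by another character extends the range;
           a chain like a-f-0-c runs from its first character to its last. */
        while (p[0] == '-' && p[1] != '\0') {
            p++;
            last = list_char(&p);
        }
        step = first <= last ? 1 : -1;   /* ranges may decrease */
        for (c = first; ; c += step) {
            if (n + 1 >= outsize)
                return -1;
            out[n++] = (char)c;
            if (c == last)
                break;
        }
    }
    out[n] = '\0';
    return (int)n;
}

/* Sketch: apply {from}{to} to s in place. Returns 0, or -1 on error. */
int transliterate(char *s, const char *from, const char *to)
{
    char f[1024], t[1024];
    unsigned char map[256];
    int nf = expand_list(from, f, sizeof f);
    int nt = expand_list(to, t, sizeof t);
    int i;

    if (nf < 0 || nt <= 0)               /* assumption: empty 'to' is an error */
        return -1;
    for (i = 0; i < 256; i++)
        map[i] = (unsigned char)i;
    /* A short char-list-2 is padded by repeating its last character. */
    for (i = 0; i < nf; i++)
        map[(unsigned char)f[i]] = (unsigned char)t[i < nt ? i : nt - 1];
    for (; *s; s++)
        *s = (char)map[(unsigned char)*s];
    return 0;
}
```
-
- For example, transliterate(buf, "a-z", "X") turns every lowercase
- letter in buf into an X, as the request {a-z}{X} does.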
-
- Unusual character ranges (such as a-f-0-\0x2d-c) are interpreted
- as running from their first character to their last (so the
- above would be treated as a-c). Note that it is not possible
- (in this release) to specify the complement of a character range
- in a transliteration request. However, this can be done in the
- search_pattern by commencing a character class with a "^" in the
- normal regular expression fashion.
-
- The highest register that can be referenced is \9.
-
- EXAMPLES
- Here are some example command strings that might be given to
- strsed:
-
- /a/A/ # Change the first 'a' into an 'A'
- g/a/A/ # Change every 'a' into an 'A'
- g/:// # Delete every ':'
- g/jack/jill/ # Change every 'jack' to a 'jill'
- /[^\s\t]/X/ # Change the first non-whitespace
- # character into an 'X'.
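-
- The simple commands above can be approximated with the POSIX
- regcomp(3)/regexec(3) interface in place of the GNU Emacs regex
- package that strsed uses, so the pattern syntax may differ
- slightly (POSIX does not define strsed's \s escape). The function
- substitute is hypothetical: it assumes the command has already
- been split into pattern, replacement and a global flag, supports
- only \0 to \9 in the replacement (any other backslash just quotes
- the next character) and ignores transliteration requests. Like
- strsed, it returns a string from malloc(3), or NULL on error.
-
```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Append n bytes of src to the growable buffer *buf. Returns 0 or -1. */
static int append(char **buf, size_t *len, size_t *cap, const char *src, size_t n)
{
    if (*len + n + 1 > *cap) {
        size_t ncap = (*len + n + 1) * 2;
        char *p = realloc(*buf, ncap);
        if (p == NULL)
            return -1;
        *buf = p;
        *cap = ncap;
    }
    memcpy(*buf + *len, src, n);
    *len += n;
    (*buf)[*len] = '\0';
    return 0;
}

/* Sketch: replace the first (or, if global, every) match of pattern.
   In repl, \0 is the whole match and \1 .. \9 are registers. */
char *substitute(const char *string, const char *pattern,
                 const char *repl, int global)
{
    regex_t re;
    regmatch_t m[10];
    size_t len = 0, cap = 0;
    char *out = NULL;
    const char *s = string, *r;
    int eflags = 0;

    if (regcomp(&re, pattern, 0) != 0)   /* 0: POSIX basic syntax */
        return NULL;
    if (append(&out, &len, &cap, "", 0) != 0)
        goto fail;

    while (regexec(&re, s, 10, m, eflags) == 0) {
        /* Copy the text before the match, then the expanded replacement. */
        if (append(&out, &len, &cap, s, (size_t)m[0].rm_so) != 0)
            goto fail;
        for (r = repl; *r; r++) {
            if (r[0] == '\\' && r[1] >= '0' && r[1] <= '9') {
                int k = *++r - '0';
                if (m[k].rm_so != -1 &&
                    append(&out, &len, &cap, s + m[k].rm_so,
                           (size_t)(m[k].rm_eo - m[k].rm_so)) != 0)
                    goto fail;
            } else {
                if (r[0] == '\\' && r[1] != '\0')
                    r++;                  /* assumption: \x stands for x */
                if (append(&out, &len, &cap, r, 1) != 0)
                    goto fail;
            }
        }
        if (m[0].rm_eo == m[0].rm_so) {  /* empty match: step past one char */
            if (s[m[0].rm_eo] == '\0') { s += m[0].rm_eo; break; }
            if (append(&out, &len, &cap, s + m[0].rm_eo, 1) != 0)
                goto fail;
            s += m[0].rm_eo + 1;
        } else {
            s += m[0].rm_eo;
        }
        if (!global)
            break;
        eflags = REG_NOTBOL;              /* '^' only matches at the start */
    }
    if (append(&out, &len, &cap, s, strlen(s)) != 0)
        goto fail;
    regfree(&re);
    return out;
fail:
    regfree(&re);
    free(out);
    return NULL;
}
```
-
- For example, substitute("jack and jack", "jack", "jill", 1)
- returns "jill and jill", the effect of g/jack/jill/.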
-
- Some more advanced examples...
-
- /\([\s\t]*\)\([^\s\t]*\)/\1\2{a-z}{A-Z}/
-
- This converts the first non-whitespace word to upper case,
- preserving any initial whitespace. It catches the first run of
- spaces and TABs into register one \([\s\t]*\), and then the
- following run of non-white characters into register two
- \([^\s\t]*\). The replacement, \1\2{a-z}{A-Z} specifies
- register 1 (the whitespace) followed by the contents of register
- 2 transliterated into uppercase. This would produce
-
- " SPOTTED pinto bean"
-
- if called on the string
-
- " spotted pinto bean".
-
- g/\([a-z]\)\1+/\1/
-
- This is a very useful example and performs the same function as
- tr -s. That is, it squeezes runs of identical characters (in
- the range a to z) down to a single instance of that character.
- So "beeee good" becomes "be god". The "+" is the regular
- expression notation meaning "one or more".
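-
- The squeeze can also be sketched as a single pass in plain C,
- which shows why it behaves like tr -s: a lowercase letter is
- dropped whenever it repeats the character just kept. The name
- squeeze_lower is hypothetical, and the letter test assumes ASCII.
-
```c
/* Sketch: squeeze each run of one repeated lowercase letter down to a
   single instance, in place; intended to match g/\([a-z]\)\1+/\1/. */
void squeeze_lower(char *s)
{
    char *dst = s;
    const char *src;

    for (src = s; *src != '\0'; src++) {
        if (dst > s && *src >= 'a' && *src <= 'z' && dst[-1] == *src)
            continue;          /* repeat of the letter just kept */
        *dst++ = *src;
    }
    *dst = '\0';
}
```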
-
- g/\([\t\s]*\)\(.\)\([^\t\s]*\)/\1\2{a-z}{A-Z}\3/
-
- This example capitalises the first letter of each word in the
- string, and preserves all whitespace. It catches three things,
-
- 1) the initial whitespace \([\t\s]*\) in register 1
- 2) the next letter \(.\) in register 2
- 3) the following non-white characters \([^\t\s]*\) in register 3
-
- and then prints them out as they were found, with the only
- difference being the uppercase conversion of the contents of
- register 2. Given the string
-
- " this is a line "
-
- this command would return
-
- " This Is A Line ".
-
- If the initial 'g' was not present in the command, then the
- capitalisation would only be done to the first word in the
- string. It is important to understand this difference well.
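-
- The difference can be made concrete with a plain C sketch of what
- the capitalising command does, where a flag stands in for the
- leading 'g'. The name capitalise_words is hypothetical, and the
- sketch assumes the string contains no newlines and that whitespace
- means only spaces and TABs, as in the pattern.
-
```c
/* Sketch: mimic g/\([\t\s]*\)\(.\)\([^\t\s]*\)/\1\2{a-z}{A-Z}\3/ in place;
   with global == 0 it mimics the same command without the 'g'. */
void capitalise_words(char *s, int global)
{
    while (*s != '\0') {
        while (*s == ' ' || *s == '\t')       /* register 1: whitespace */
            s++;
        if (*s == '\0')
            break;
        if (*s >= 'a' && *s <= 'z')           /* register 2 via {a-z}{A-Z} */
            *s = (char)(*s - 'a' + 'A');
        s++;
        while (*s != '\0' && *s != ' ' && *s != '\t')  /* register 3 */
            s++;
        if (!global)
            break;
    }
}
```
-
- Called on " this is a line " with global set to 0, it leaves
- " This is a line ".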
-
- SEARCHING ONLY
- Strsed may be used to search for a regular expression in a
- string, but perform no action. The portion of the string that
- matched will be returned in the third argument range. In this
- case command should be of the form /pattern/. On return,
- range[0] will contain an index into the original string to
- indicate where the match began, and range[1] will index the
- first character after the end of the match. For example, after
- the call
-
- int range[2];
- strsed("two big macs please", "/b.*c/", range);
-
- range[0] will contain 4 and range[1] will contain 11. If no
- match is found, both elements of range will contain -1.
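-
- The search-only behaviour can be sketched with the POSIX
- regcomp(3) and regexec(3) interface, which stands in here for the
- GNU Emacs regex package that strsed actually uses. The function
- search_range is hypothetical; it takes the pattern without its
- surrounding slashes and compiles it as a POSIX basic regular
- expression.
-
```c
#include <regex.h>

/* Sketch: set range[0] to the start of the first match of pattern in
   string and range[1] to the index just past its end, or both to -1 if
   there is no match. Returns -1 if the pattern does not compile. */
int search_range(const char *string, const char *pattern, int range[2])
{
    regex_t re;
    regmatch_t m;

    if (regcomp(&re, pattern, 0) != 0)
        return -1;
    if (regexec(&re, string, 1, &m, 0) == 0) {
        range[0] = (int)m.rm_so;
        range[1] = (int)m.rm_eo;
    } else {
        range[0] = range[1] = -1;
    }
    regfree(&re);
    return 0;
}
```
-
- Given the string and pattern from the example above, it sets
- range[0] to 4 and range[1] to 11.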
-
- ERRORS
- If strsed detects any error it returns NULL. This can happen if
- the syntax of command is incorrect, if the regular expression in
- command is incorrect, if space cannot be obtained from
- malloc(3), or for other similar reasons. Note that it is not an
- error if the empty string is returned.
-
- COMPILING AND LINKING STRSED
- Strsed should be compiled with the -O and -c options of your C
- compiler. It has no main() function. When linking, use strsed.o
- together with regex.o from the GNU Emacs 18.55 (or 18.54)
- distribution.
-
- OBSCURE NOTE ON REGULAR EXPRESSIONS
- It is possible (but not too likely) that the regular expression
- language that is recognised may differ slightly from
- installation to installation. This is because the GNU regular
- expression package may be compiled with different settings for
- recognition of meta-characters. So on one machine, the
- character "|" might be taken as being the OR operator, whilst
- somewhere else you need to give "\|" - or vice-versa. This
- could be a pain in the neck, but there's not a lot that can be
- done about it. If you really need to know the difference in a
- portable way, look in regex.h to see what things are defined and
- then act accordingly when constructing commands for strsed.
-
- AUTHOR
- Terry Jones
- PCS Computer Systeme GmbH
- Pfaelzer-Wald-Str 36
- 8000 Muenchen 90
- West Germany 49-89-68004288
-
- terry@distel.pcs.com
- or ...!{pyramid,unido}!pcsbst!distel!terry
-
- January 8th, 1990.
-
- ACKNOWLEDGEMENTS
- Many thanks to Jordan K. (mother) Hubbard for discussions,
- bugfinding, handholding, forcing me to use emacs and torrents of
- (usually) uncalled-for abuse.
-
- SEE ALSO
- ed(1), tr(1)
-
-